1. INTRODUCTION

Language is the most prevalent means by which people communicate with one another. Slang is an informal
variety of language spoken by a particular group of individuals [1]. Every language has its own set of Slang
phrases. Slang is difficult to define, but its terms are easily understood within the group that uses them [2].
Younger people use Slang terminology with their friends and on social media, and Slang words are often used in
movies and songs. People increasingly use a variety of social media channels to express themselves in their
native language. English has already received considerable research attention on distinguishing Slang phrases.
Other languages lack comparable study, so their users can easily publish Slang phrases on social media [3].
Because of this limitation, any kind of Slang word can circulate freely in the media, which is not good for
society. Younger generations are accustomed to social media platforms and commonly use Slang terminology in
their posts, texts, and messages. Teenagers are often immature and may lack the understanding needed to use
appropriate language on social media. On social media and other platforms, Slang, jargon, and hostile content
can have a long-term unfavorable impact [4]. Detecting informal words on social media is therefore a
significant task, since such words frequently lead to language abuse. It is necessary to create a model that
identifies Slang terms easily, so that unnecessary Slang, offensive, or jargon phrases are not used improperly.
Because the available collection of Bengali Slang words is limited, little research has been done on Slang terms
in the Bengali language, although Slang phrases are now commonly used on social networking sites. As of 2022,
Bengali is the 5th most spoken language worldwide [5]. If this large population of speakers continues to use
Slang words on social media, it will negatively affect both the language and society. The limited size of the
Bengali Slang word dataset also makes it very hard to build a proper model that detects Slang words on social
media.

An enormous amount of research is under way in natural language processing (NLP) [6]. Several authors
have previously worked on Slang detection, and their work has greatly helped the research community.
Pal and Saha [4] developed a system that detects jargon words in e-text with the help of semi-supervised
learning. The system picks each word from a text file and checks whether it is a jargon word. One drawback is
that it treats some words used in the medical and judicial fields as jargon, although these should be handled
as exceptions. Haq et al. [3] detect Slang or jargon words in Urdu social media text. They first convert their
data to UCS transformation format 8-bit (UTF-8) encoding and then apply their algorithm. They report accuracies
of up to 72.6% and 55.21% for the offensive and non-abusive classes, respectively, but their dataset contains
only 1,200 words, which is very small. Another study automatically explains the meaning of Chinese Slang
words [7]. For this purpose, it uses dual character-level encoding with the seDCEAnn model, a focus-based
neural network. To reduce human comment bias, the authors of [8] present a two-class hate speech (HS) dataset
in Bengali (HS-BAN) that contains more than 50,000 tagged comments, of which 40.17% are hate speech and the
rest are non-hate speech. The dataset also includes some Slang words, and the benchmark achieved an 86.78% F1
score using a Bi-LSTM model on top of informal word embeddings for short text. Emon et al. [9] apply several
machine learning and deep learning-based algorithms to detect abusive Bengali text, and the deep learning-based
algorithm achieves the best accuracy. Hossain et al. [10], in their work on "discovering political Slang in
readers' comments", use machine learning approaches, in particular an unsupervised algorithm for detecting
creative Slang. They also developed poliSlang, an algorithm that extracts creative Slang words from datasets.
The primary contribution of Asghar et al. [11] is a system for recognizing and scoring internet Slang (DSIS)
using SentiWordNet and other lexical resources. Their comparison shows that the proposed system outperforms
existing approaches, and they suggest that it may be used to create an opinion lexicon for Slang terms.
Alhumoud and Wazrah [12] collect data from Facebook comments and use data mining techniques. They also provide
an algorithm that can detect cyberbullying in a comment. It achieves a fair 77% accuracy in recognizing one of
the following cyberbullying categories: sexual, physical sexual, religious, political, appearance, racism,
cultural, psychological, adversely praying for a person, and general cyberbullying. The support vector machine
classifier produced the best results, while the adaptive boosting technique achieved the highest precision
rate of 94%. Jiamthapthaksin et al. [13] proposed a method for extracting popular Thai Slang by comparing
social media posts. It uses tokenization and a dictionary-based method to extract unknown words, and then
expands them with the n-gram method to determine which Slang words are currently trending and popular. The
rest of the paper is organized as follows: section two describes the methodology in detail, section three
presents the results and discussion, and section four concludes our research.


2. METHOD 

The method section is divided into four subsections, each showing how we achieved a specific aim: a
description of the data, preprocessing of the data, a brief discussion of the classifiers, and the model
evaluation. Figure 1 shows the working procedure of our work.


2.1. Description of the data 

A fresh Slang phrase from some group or culture appears every day. However, people do not record such
phrases on paper or in a database; they circulate in closed circles and on social media. This is why it is
hard to collect Bengali Slang words. We gathered Slang terms from the internet, social media, private groups,
literature, and many different Slang communities. Our research has two classes, Slang and non-Slang, and we
kept the classes in proportion while collecting. We collected around 8,110 samples and stored them in our
corpus: the Slang class has 4,056 samples and the non-Slang class has 4,054.


2.2. Data preprocessing 


Preprocessing is an important activity in text mining, NLP, and information retrieval (IR) [14]. After
collecting the raw data, data annotation is required because we apply supervised models, and supervised
training requires a large quantity of labeled data [15]. A model performs poorly when it is given messy data,
so clean data is necessary for good results. Sentences and phrases sometimes contain unnecessary characters
such as “@”, “|”, and “)”. These characters carry no meaning in this case, so they must be removed from every
sentence or phrase. Table 1 demonstrates the removal of unnecessary characters.
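
The cleaning step described above can be sketched as follows. This is only a minimal illustration: the character set `UNWANTED_CHARS`, the helper name `clean_text`, and the sample sentence are our assumptions, since the full list of removed characters is not given here.

```python
# Hypothetical set of characters to strip (an assumption, not the paper's list).
UNWANTED_CHARS = "?!@|()[]{}#$%^&*+=~<>/\\\"',.;:।"

# Map every unwanted character to a space so neighbouring words stay separated.
_TABLE = str.maketrans({ch: " " for ch in UNWANTED_CHARS})


def clean_text(text):
    """Replace unwanted characters with spaces, then collapse extra whitespace."""
    return " ".join(text.translate(_TABLE).split())


# Illustrative sentence only; it is not taken from the corpus.
print(clean_text("তুমি @কেমন| আছো?)"))
```

Replacing a symbol with a space, instead of deleting it, avoids gluing two words together when a symbol is the only thing between them.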


Figure 1. Working procedure of our work (steps: raw data, data annotation, removal of unnecessary characters,
stop word removal, TF-IDF, model creation, model evaluation)


Table 1. Removing unnecessary characters
Original text        After removing unnecessary characters


Stop words, often known as noise words, are words that convey only a small amount of information, which
is usually unnecessary [16]. They must be removed before the model is trained. We develop a corpus of Bengali
stop words and remove any of these stop words found in the dataset, since they add no meaning in this context.
Table 2 shows the removal of stop words.
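
A minimal sketch of stop word filtering is given below. The four entries in `STOP_WORDS` are common Bengali function words chosen by us for illustration; they are an assumption and do not reproduce the stop word corpus built for this work.

```python
# Illustrative stop words only (assumed); the actual stop word corpus is larger.
STOP_WORDS = {"এবং", "ও", "কিন্তু", "এই"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens that appear in the stop word set."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)


# Illustrative phrase only; it is not taken from the corpus.
print(remove_stop_words("আমি এবং তুমি"))
```

Keeping the stop words in a set makes each membership check constant time, which matters once the corpus holds thousands of phrases.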


Table 2. Removing stop words
Text        After removing stop words


Stemming means reducing a word to its root. It is highly effective in NLP tasks because a model learns
better from root words. We apply a set of stemming rules to obtain the root words. In the final stage of
preprocessing, the sentences or words are still string-type data, which a model cannot read directly. Several
techniques address this problem, and one of them is the TF-IDF vectorizer. It represents each word by a numeric
value, assigning each word a weight that depends on its frequency. TF-IDF computes a value for each word in a
document by multiplying the term's frequency in that document by the logarithm of the inverse of the
proportion of documents in which the term appears. Words with high TF-IDF values are strongly associated with
the document in which they appear [17]. The TF-IDF vectorizer formulas are shown in (1) to (3):


$$\mathrm{TF} = \frac{\text{Number of times the term appears in a document}}{\text{Total number of words in the document}} \tag{1}$$

$$\mathrm{IDF} = \log\left(\frac{\text{Number of documents in the corpus}}{\text{Number of documents in the corpus that contain the term}}\right) \tag{2}$$

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{3}$$
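
To make formulas (1) to (3) concrete, the sketch below computes the weights for a toy corpus in plain Python. The function name `tf_idf`, the toy documents, and the use of the natural logarithm are our assumptions; the base of the logarithm is not stated, and library vectorizers often apply smoothed variants that give slightly different numbers.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document, following (1)-(3)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # TF (1) times IDF (2) gives TF-IDF (3); natural log is assumed.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus of two tokenized documents (hypothetical data).
docs = [["a", "b", "a"], ["b", "c"]]
scores = tf_idf(docs)
print(scores)
```

In this toy corpus the term "b" occurs in every document, so its IDF is log(1) = 0 and its weight vanishes, which is the intended effect for uninformative words.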


We divided our dataset into two sections, using 80% of the data for training and the remaining 20% for testing. 
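
The 80/20 split can be sketched as below. The shuffle, the fixed seed, and the helper name `train_test_split` are assumptions made for this illustration; whether the original split was shuffled or stratified by class is not stated.

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and hold out the last test_ratio share."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_ratio)
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]


# Indices stand in for the 8,110 corpus entries.
train_set, test_set = train_test_split(range(8110))
print(len(train_set), len(test_set))
```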


2.3. Discussion of classifiers 

Support vector machine (SVM) is an algorithmic implementation of principles from statistical
learning theory [18] that addresses the challenge of constructing consistent approximations from data. The
k-nearest neighbor (KNN) technique differs in that it uses the data directly for classification rather than
first developing a model [19], [20]. The decision tree method produces a tree-like structure by repeatedly
dividing the dataset into subsets using a criterion that optimizes data separation [21], [22].
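
To illustrate how k-nearest neighbor classifies directly from stored data without a training phase, a minimal sketch follows. The function `knn_predict`, the Euclidean distance, k = 3, and the two-dimensional toy vectors are all our assumptions; the vectors merely stand in for TF-IDF features.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors standing in for TF-IDF vectors (hypothetical data).
X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.0), (0.1, 0.9), (0.2, 0.8), (0.0, 0.7)]
y = ["slang", "slang", "slang", "non-slang", "non-slang", "non-slang"]
print(knn_predict(X, y, (0.85, 0.15)))
```

All the work happens at prediction time: the "model" is simply the stored vectors and labels, which is the contrast with SVM and decision trees drawn above.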


2.4. Model evaluation 

The confusion matrix is generally used to evaluate the performance of multi-class or
binary-class classification models [23]. We evaluate our models with several metrics: precision, recall, and
F1 score. Model accuracy is also reported.
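
These metrics follow directly from the four cells of a binary confusion matrix, as the sketch below shows. The function name `binary_metrics` and the example counts are hypothetical and are not taken from our experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Swapping the roles of the two classes (treating non-Slang as positive) yields the second row of a per-class report.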


3. RESULTS AND DISCUSSION 

This work is a binary classification problem, so each model produces a 2x2 confusion matrix. To
evaluate the seven classifier models, we use several performance metrics: model accuracy, precision, recall,
and F1-score. The results of the models are given below.


3.1. Result of models 

Table 3 summarizes the results of the models and their performance metrics. Overall, the models reach
accuracies between 60% and 70%. Logistic regression gives the best model accuracy (70%), and XG-Boost gives
the lowest (60%). On the other hand, the highest precision, 0.90, is found for the Slang class of XG-Boost, and
the lowest, 0.56, for the non-Slang class of the same model. Five models (logistic regression, KNN, random
forest, decision tree, and SVM) show a relatively small difference between the two classes in precision,
recall, and F1-score, which is why they have high model accuracy and low variance. The best model, logistic
regression, reaches 70% accuracy with an average precision of 0.72, a non-Slang recall of 0.86, and an
F1-score of 0.69.


Table 3. Classification report and accuracy

Classification algorithm   Class       Precision   Recall   F1-score   Average precision   Accuracy (%)
Naive Bayes                Slang       0.59        0.86     0.70       0.67                64
                           Non-Slang   0.75        0.41     0.53
Logistic Regression        Slang       0.79        0.53     0.63       0.72                70
                           Non-Slang   0.65        0.86     0.74
KNN                        Slang       0.79        0.44     0.57       0.74                66
                           Non-Slang   0.61        0.89     0.73
Random Forest              Slang       0.82        0.49     0.61       0.73                69
                           Non-Slang   0.64        0.89     0.74
Decision Tree              Slang       0.79        0.51     0.62       0.71                69
                           Non-Slang   0.64        0.86     0.73
SVM                        Slang       0.78        0.52     0.63       0.71                69
                           Non-Slang   0.64        0.86     0.74
XG-Boost                   Slang       0.90        0.23     0.37       0.73                60
                           Non-Slang   0.56        0.98     0.71
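
As a worked check of how the columns relate, the logistic regression row can be recomputed by hand. We assume here that the F1-score is the harmonic mean of precision and recall and that the average precision column is the unweighted mean of the two class precisions; the second assumption is ours and is not stated explicitly.

```latex
\begin{align*}
F_1^{\text{Slang}} &= \frac{2 \times 0.79 \times 0.53}{0.79 + 0.53}
                    = \frac{0.8374}{1.32} \approx 0.63 \\
\text{Average precision} &= \frac{0.79 + 0.65}{2} = 0.72
\end{align*}
```

Both values agree with the logistic regression entries reported for the Slang F1-score and the average precision.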


3.2. Discussion of model performance

The receiver operating characteristic (ROC) curve was used to assess the performance of the
classification methods. The ROC curve is an effective way to show a classifier's performance when selecting a
suitable model from several candidates [24]. A ROC curve is a plot that summarizes the performance of a binary
classification model on the positive class. Figure 2 shows the ROC curves of our seven classifier models. The
x-axis indicates the false positive rate and the y-axis indicates the true positive rate. In the ROC plot, the
yellow line rises highest compared with the others. The yellow line belongs to the logistic regression model,
whose model accuracy is the highest. In contrast, XG-Boost has the lowest curve of the seven models. Every
model in our case gives a high true positive rate, which is a good property, and we therefore conclude that
every model is balanced in this case and learns well from our dataset.

The area under the ROC curve (AUC) is a widely used performance indicator for classification [25].
ROC AUC summarizes in a single number how well a model separates the positive class from the negative class
across all decision thresholds. Its value lies between 0.0 and 1.0, where 0.5 corresponds to random guessing;
the higher the value, the better the model. Class balance matters when reading it: a dataset in which the
negative class has many more examples than the positive class is imbalanced, whereas our corpus is nearly
balanced, with 4,056 Slang and 4,054 non-Slang samples. Every model attains a decent AUC value for an NLP task
of this kind. Figure 3 compares the accuracy, recall, and precision of the classifiers.
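
For readers who want to see how a ROC curve and its AUC are obtained from classifier scores, a minimal sketch follows. The function names `roc_points` and `auc`, the trapezoidal integration, and the six labeled scores are our assumptions for illustration, and the sketch presumes that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = Slang) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9, which matches the reading of AUC as the probability that a random positive outscores a random negative.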


Figure 2. ROC curve of 7 classifiers model (x-axis: false positive rate; y-axis: true positive rate; curves:
Naive Bayes, Logistic Regression, KNN, Decision Tree, Random Forest, SVM, XG-Boost)


Figure 3. Comparison of accuracy, recall, and precision of different classifiers


4. CONCLUSION 

Slang is a colloquial form of language that can have a detrimental impact on a community. A proper
model is required to recognize Bengali Slang terms readily. In this work, we conducted classification-based
research in which the models detect Slang phrases automatically. According to the analysis of the results, the
best model recognizes Slang terms with 70% accuracy and 72% precision. Even though most Slang phrases are
short, we observed that longer Slang terms influence the model. This work also points to the need for a Bengali
internet Slang dictionary. A better model for Bengali Slang detection must be created, with additional Slang
terms added to the dictionary so that individuals may contribute new Slang words with suitable meanings. New
Slang terms come into use every day, which makes it difficult to compile a complete list. In future work, a
larger number of Slang phrases may be gathered, and then more powerful algorithms such as deep learning models
and transformer-based models like BERT or XLNet can be applied.